NetApp: wait for volume to become RW after snapmirror break#299
Carthaca wants to merge 2 commits into stable/2023.1-m3 from
Conversation
Just a thought: the snapmirror relationship status could also be checked to be "broken-off".
Another thought: we could mount the volume while it's still in DP mode once promote is selected, and then perform the SnapMirror break.
Is that allowed? As far as I remember, the mount was not allowed for a DP-type volume; the junction path could only be applied once the volume was no longer DP. Maybe something changed in newer releases.
Yes, adding junction-path to the DP volume is possible both before and after the initial transfer. However, data access through the junction path is only permitted once the baseline copy has completed. |
Yes, that was the confusion; I barely remember, but during the initial transfer it used to give a warning. @Carthaca, since we offer RO replicas, the mount operation must have been kicked off earlier. Is it only for the case where the user creates the access rule for the replica/destination?
During replica promotion, replicas failed with mount errors after snapmirror break operations. NetApp audit logs showed that the break commands were issued but remained in "Pending" state while Manila immediately attempted to mount the volumes. The mounts failed because the volumes were still DP type: the break operations hadn't completed yet.

The break_snapmirror method previously assumed break_snapmirror_vol() was synchronous and that the volume would immediately be RW. In practice, the break operation can take several seconds to complete.

This adds a polling loop after break_snapmirror_vol() that waits for the volume type to transition from 'dp' to 'rw' before attempting the mount. The implementation mirrors the existing wait_for_quiesced logic, using the netapp_snapmirror_quiesce_timeout config with 5-second intervals. If the volume doesn't become RW within the timeout, a NetAppException is raised with details about the timeout and volume name.

Additionally, this optimizes promotion for readable replicas by skipping the mount operation entirely. Readable replicas are already mounted when created (with junction path), since they need to be accessible for read operations. Only DR replicas need mounting after snapmirror break.

Change-Id: I2b8f9a1c5d7e3a4f6b9c8d1e2f3a4b5c6d7e8f9a
Signed-off-by: Maurice Escher <maurice.escher@sap.com>
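The polling loop described in the commit message can be sketched as follows. This is a minimal illustration, not the actual driver code: the client method name (get_volume_type) and the helper name are hypothetical stand-ins for whatever the NetApp client exposes; the real patch mirrors the existing wait_for_quiesced logic and reads the timeout from the netapp_snapmirror_quiesce_timeout config option.

```python
import time


class NetAppException(Exception):
    """Stand-in for the driver's NetApp exception type."""


def wait_for_volume_rw(client, volume_name, timeout=3600, interval=5):
    """Poll until the broken-off volume reports type 'rw'.

    `client.get_volume_type` is a hypothetical call returning
    'dp' or 'rw' for the named volume.
    """
    retries = int(timeout / interval) or 1
    while retries > 0:
        if client.get_volume_type(volume_name) == 'rw':
            return
        retries -= 1
        if retries > 0:
            time.sleep(interval)
    # Mirrors the wait_for_quiesced failure mode: raise with the
    # timeout and volume name so the operator can diagnose it.
    raise NetAppException(
        "Volume %(vol)s did not become RW within %(timeout)s seconds "
        "after snapmirror break." % {'vol': volume_name,
                                     'timeout': timeout})
```

With a loop like this in place, the subsequent mount only runs once ONTAP has actually finished the break, instead of racing a still-pending operation.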
6b5f68f to c314414
@sumitarora2786 yes, the additional mount is not needed in the readable-replica case; I made that optimization in the code. The both-sides-DP problem was caused by the periodic update trying to "repair" the original snapmirror (
…tion

During replica promotion, the API layer sets the replica status to STATUS_REPLICATION_CHANGE, and both promote_share_replica and _share_replica_update use the @locked_share_replica_operation decorator to prevent concurrent execution via a shared lock. However, after a promotion failure, a sequential race can occur:

1. API sets replica status to STATUS_REPLICATION_CHANGE
2. Promote operation acquires lock and starts
3. Promote fails (e.g., mount error after snapmirror break)
4. Exception handler sets both replicas to ERROR status
5. Promote releases lock and exits
6. periodic_share_replica_update acquires lock shortly after
7. Sees both replicas in ERROR, but no snapmirror relationship
8. Attempts to recreate snapmirror based on stale database state
9. Creates relationship in wrong direction (old source -> old dest)

This adds two safeguards in _share_replica_update:

1. Explicitly skip replicas with STATUS_REPLICATION_CHANGE status. While the lock prevents concurrent execution during promotion, this provides defense-in-depth and makes the intent explicit. STATUS_REPLICATION_CHANGE is intentionally NOT added to TRANSITIONAL_STATUSES, as it has special handling in the Share model's instance property for replica selection ordering.

2. If both the active replica and the target replica are in ERROR status, skip the driver update entirely. This prevents automatic "recovery" after failed critical operations that require manual intervention. Without this, periodic updates recreate snapmirror relationships in incorrect directions after failed promotions.

The checks are placed in the share manager (not the driver), as they are policy decisions about when to skip automatic operations.

Change-Id: I3c7d9b2e8f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c
Signed-off-by: Maurice Escher <maurice.escher@sap.com>
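The two safeguards in that commit message reduce to a simple skip predicate. The sketch below is illustrative only: should_skip_replica_update is a hypothetical helper (the real change lives inline in Manila's _share_replica_update), and the status string literals are assumptions mirroring manila.common.constants.

```python
# Assumed values, mirroring manila.common.constants.
STATUS_ERROR = 'error'
STATUS_REPLICATION_CHANGE = 'replication_change'


def should_skip_replica_update(active_replica, replica):
    """Return True when the periodic update must leave this pair alone.

    Both arguments are dicts with at least a 'status' key, standing in
    for Manila share-replica DB records.
    """
    # Safeguard 1: a promotion is in flight (or its status was never
    # cleared). Skipping here is defense-in-depth on top of the
    # @locked_share_replica_operation lock.
    if replica['status'] == STATUS_REPLICATION_CHANGE:
        return True
    # Safeguard 2: both sides errored after a failed critical operation.
    # An automatic "repair" could recreate the snapmirror relationship
    # in the wrong direction; require manual intervention instead.
    if (active_replica['status'] == STATUS_ERROR
            and replica['status'] == STATUS_ERROR):
        return True
    return False
```

Keeping this decision in the share manager rather than the driver matches the commit's rationale: it is a policy about when automatic operations should run at all, not a backend-specific detail.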
c314414 to 0a1f256